Abstract: In this paper, we propose a distributed reinforcement
learning (RL) technique called distributed power control
using Q-learning (DPC-Q) to manage the interference caused
by the femtocells on macro-users in the downlink. The DPC-Q
leverages Q-learning to identify the sub-optimal pattern of power
allocation, which strives to maximize femtocell capacity, while 
guaranteeing macrocell capacity level in an underlay cognitive 
setting. We propose two different approaches for the DPC-Q 
algorithm: namely, independent, and cooperative. In the former, 
femtocells learn independently from each other while in the latter, 
femtocells share some information during learning in order to 
enhance their performance. Simulation results show that the 
independent approach is capable of mitigating the interference 
generated by the femtocells on macro-users. Moreover, the results 
show that cooperation enhances the performance of the femtocells 
in terms of speed of convergence, fairness and aggregate femtocell 
capacity. 

I. Introduction 

Femtocells are considered to be a highly promising solution 
for the enhancement of the indoor coverage problem. However, 
femtocells are deployed unpredictably in the macrocell area. 
Thus, their interference on macro-users and other femtocells 
is considered to be a daunting problem [1], [2].

Since femtocells are installed by the end user, their number 
and positions are random and unknown to the network operator.
This makes the centralized approach for solving the interference
problem very hard due to the huge overhead needed,
which in turn calls for a distributed interference management
strategy. In the distributed scheme, each femtocell needs to
learn how to interact with the dynamic environment created by 
the coexistence of the femto and macro cells in order to adjust 
its parameters (carrier frequency and transmission power) to 
satisfy the QoS of its own users while guaranteeing certain 
QoS for the macrocell users. 

Based on these observations, in this paper we focus on
closed access femtocells [3] working in the same bandwidth
as the macrocells (cognitive femtocells). We use a
distributed machine learning technique called reinforcement
learning (RL) [4] to handle the interference generated
by the femtocells on the macrocell users. One of the most
popular RL techniques is Q-learning [5]. We chose
Q-learning because it finds optimal decision policies without
any prior model of the environment (in our setting, a prior
model cannot be obtained due to the unplanned placement of
the femtocells). Moreover, Q-learning allows the agents (i.e.,
the femtocells) to take actions while they are learning (i.e.,
no need for a centralized approach). These features make
Q-learning well suited to the distributed femtocell
setting in the form of the so-called multi-agent Q-learning
(MAQL) [6]. In this paper, MAQL is applied in two different
paradigms: independent learning (IL) and cooperative learning
(CL). The former assumes that agents are unaware of the
other agents' actions, while the latter allows the agents to
share some knowledge while they are learning to enhance their
performance JS], Q. 

In the literature, RL has been used to perform power allocation
in wireless networks. In [8], [9], the authors used IL Q-learning
to perform power allocation in order to control the aggregate
interference generated by multiple secondary users on the
primary receiver of a digital TV (DTV) system. In [10], the
authors addressed the same goal of interference control but
in the context of OFDMA-based femtocells. In [11], the authors
used IL Q-learning in the context of cognitive femtocells and
introduced a new concept called docitive femtocells. However,
all the papers discussed above were interested in maintaining
the QoS of the primary users and ignored the QoS of the
femtocells (e.g., fairness, maximizing the femtocell capacity).
Moreover, they all used the IL paradigm and did not take into
consideration any cooperation between the agents (femtocells)
during the learning process.

Motivated by this, in this paper we apply Q-learning for
power control in a closed access cognitive femtocell network.
The contributions of this paper can be summed up as follows:

• A distributed algorithm based on the IL paradigm is used to
handle the interference problem. A new reward function
is introduced and compared to the reward function used
in the literature [10], [11]. The comparison is applied in two
different scenarios:

1) Maintaining the QoS (i.e. the capacity) of the macro- 
cell without taking into consideration the QoS of the 
femtocells. 

2) Enhancing the capacity of the femtocells while main- 
taining the QoS of the macrocell. 

• Cooperation between the femtocells is introduced to 
enhance the aggregate capacity and fairness amongst all 
the femtocells, while maintaining the macrocell QoS. 

The remaining part of this paper is organized as follows.
Section II gives a brief background for the original single
agent Q-learning. In Section III the system model is described.
Section IV introduces the proposed distributed Q-learning
algorithm and the Q-learning formulation for the cognitive
femtocells problem. The simulation scenario and the results
are discussed in Section V. Finally, the conclusion is given in
Section VI.

II. Background: Single Agent Q-learning (SAQL) 

In this section, the idea of Q-learning is presented by 
introducing the single agent case Q. The Q-learning model 
can be defined by the tuple $\{S, A, R(s, a)\}$, where
$S = \{s_1, s_2, \dots, s_m\}$ is the set of possible states the agent can
occupy, $A = \{a_1, a_2, \dots, a_l\}$ is the set of possible actions the
agent can perform and $R(s, a)$ is the reward function that
determines the reward fed back to the agent by the environment
when performing action $a$ in state $s$. The interaction between
the agent and the environment at time $t$ can be described as
follows:

• The agent senses the environment and observes its current
state $s_t \in S$.

• Based on $s_t$, the agent selects action $a_t \in A$.

• Based on $a_t$, the environment makes a transition to a
new state $s_{t+1} \in S$ and, as a result, the agent receives a reward
$r_t = R(s_t, a_t)$ due to this transition.

• The reward is fed back to the agent and the process is
repeated.

The end goal of the agent is to find an optimal policy $\pi^*(s)$,
which defines the action to be selected for each state $s \in S$
in order to maximize the expected discounted reward over an
infinite time:



$$V^{\pi}(s) = E\left\{ \sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, \pi(s_t)) \,\middle|\, s_0 = s \right\} \qquad (1)$$

where $V^{\pi}(s)$ is the value function of state $s$, which represents
the expected discounted infinite reward when the initial
state is $s_0 = s$, and $0 \le \gamma < 1$ is the discount factor that determines
how much effect future rewards have on the decisions at each
moment. Furthermore, equation (1) can be expressed as [10]:

$$V^{\pi}(s) = E\{r(s, \pi(s))\} + \gamma \sum_{s' \in S} P_{s,s'}(\pi(s))\, V^{\pi}(s') \qquad (2)$$

where $s'$ is the new state to which the environment transits
after taking action $a = \pi(s)$ and $P_{s,s'}(a)$ is the transition
probability from state $s$ to state $s'$ after performing action
$a = \pi(s)$. From equation (2), the optimal value function $V^*(s)$
can be written as:

$$V^*(s) = \max_{a \in A} \left( E\{r(s, a)\} + \gamma \sum_{s' \in S} P_{s,s'}(a)\, V^*(s') \right) \qquad (3)$$

Q-learning aims at finding the optimal policy $\pi^*(s)$ that
corresponds to $V^*(s)$ without having any prior knowledge
about the transition probabilities $P_{s,s'}(a)$. In order to do this,
a new value called Q-value is defined for each state-action
pair, where the optimal Q-value is defined as:

$$Q^*(s, a) = E\{r(s, a)\} + \gamma \sum_{s' \in S} P_{s,s'}(a) \max_{b \in A} Q^*(s', b) \qquad (4)$$

Equation (4) states that the optimal value function can be
expressed by $V^*(s) = \max_{a \in A} Q^*(s, a)$. Thus, if the optimal
Q-value is known for each state-action pair, the optimal policy
can be determined by $\pi^*(s) = \arg\max_{a \in A} Q^*(s, a)$. The
Q-learning algorithm finds $Q^*(s, a)$ in a recursive manner using
a simple update rule:

$$Q(s, a) := (1 - \alpha)\, Q(s, a) + \alpha \left( r(s, a) + \gamma \max_{b \in A} Q(s', b) \right) \qquad (5)$$

where $\alpha$ is the learning rate. It was proved in [5], [12]
that this update rule converges to the optimal Q-value under
certain conditions. One of these conditions is that each
state-action pair must be visited infinitely often Q. To address
this notion, a random number $\epsilon$ is introduced, where at each
step of the learning process the action is chosen according to
$a = \arg\max_{a \in A} Q(s, a)$ with probability $(1 - \epsilon)$ or randomly
with probability $\epsilon$. Moreover, in the convergence proof, the
reward function is assumed to be bounded and deterministic
for each state-action pair [12]. However, in the multi-agent
case, this condition is violated since the reward for each state
will depend on the joint action of all agents, hence the reward
function will not be deterministic from the agent's point of view.
Thus, in Section V the effect of choosing the reward function
will be addressed using simulations.

III. System Model

In this paper, a wireless network is considered that consists of
one macrocell, served by a Macro Base Station (MBS) with a
single transmit and receive antenna, underlaid with $N_{femto}$
femtocells, each with one Femto Base Station (FBS). $U_m$
and $U_f$ macro and femto users are located randomly inside
the macro and femto cells respectively. Both the MBS and the FBSs
transmit over the same $N_{sub}$ subcarriers, where orthogonal
downlink transmission is assumed in each time slot.

The transmission powers of the MBS and FBS $i$ in subcarrier
$n$ are denoted by $P_o^{(n)}$ and $P_i^{(n)}$ respectively. Moreover,
the maximum transmission powers for the MBS and FBS $i$
are $P_{max}^{m}$ and $P_{max}^{f}$ respectively, where
$\sum_{n=1}^{N_{sub}} P_o^{(n)} \le P_{max}^{m}$ and $\sum_{n=1}^{N_{sub}} P_i^{(n)} \le P_{max}^{f}$.

The system performance is analyzed in terms of the capacity
measured in (bits/sec/Hz). The capacity achieved by the MBS
at its associated user in subcarrier $n$ is:
$$C_o^{(n)} = \log_2\left(1 + \frac{h_{oo}^{(n)} P_o^{(n)}}{\sum_{i=1}^{N_{femto}} h_{io}^{(n)} P_i^{(n)} + \sigma^2}\right) \qquad (6)$$

where $h_{oo}^{(n)}$ denotes the channel gain between the MBS and
its associated user in subcarrier $n$; $h_{io}^{(n)}$ denotes the channel
gain between FBS $i$ and the macro user in subcarrier $n$ and
$\sigma^2$ is the noise power. The capacity achieved by FBS $i$ at its
associated user in subcarrier $n$ is:

$$C_i^{(n)} = \log_2\left(1 + \frac{h_{ii}^{(n)} P_i^{(n)}}{h_{oi}^{(n)} P_o^{(n)} + \sum_{j=1, j \ne i}^{N_{femto}} h_{ji}^{(n)} P_j^{(n)} + \sigma^2}\right) \qquad (7)$$

where $h_{ii}^{(n)}$ denotes the channel gain between FBS $i$ and its
associated user in subcarrier $n$; $h_{ji}^{(n)}$ denotes the channel gain
between FBS $j$ and the femto user associated with FBS $i$ in
subcarrier $n$, and $h_{oi}^{(n)}$ is the corresponding gain between the
MBS and that femto user.
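The two capacity expressions of this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function and argument names are hypothetical, powers are taken in linear units (mW) rather than dBm, and the femto-user denominator is assumed to contain the macrocell signal plus all other femtocells as interference.

```python
import math


def dbm_to_mw(p_dbm):
    """Convert a power in dBm to milliwatts (linear scale)."""
    return 10.0 ** (p_dbm / 10.0)


def macro_capacity(h_oo, p_o, h_io, p_femto, noise):
    """Macro-user capacity on one subcarrier, in bits/sec/Hz.

    h_io[i] is the gain from FBS i to the macro user and p_femto[i]
    is the linear power of FBS i on this subcarrier.
    """
    interference = sum(h * p for h, p in zip(h_io, p_femto))
    return math.log2(1.0 + h_oo * p_o / (interference + noise))


def femto_capacity(i, h_to_user, p_femto, h_oi, p_o, noise):
    """Capacity of FBS i at its own user on one subcarrier.

    h_to_user[j] is the gain from FBS j to the user of FBS i.
    Assumption: the MBS (gain h_oi, power p_o) is counted as an interferer.
    """
    signal = h_to_user[i] * p_femto[i]
    interference = h_oi * p_o + sum(
        h * p for j, (h, p) in enumerate(zip(h_to_user, p_femto)) if j != i
    )
    return math.log2(1.0 + signal / (interference + noise))
```

Keeping the gains and powers as plain lists makes it easy to evaluate one subcarrier at a time, which is how the states and rewards are defined later in the paper.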

IV. Distributed Power Control using Q-learning 
(DPC-Q) 

In this section, a distributed MAQL technique called DPC-Q
is presented, where multiple agents (i.e., femtocells) aim at
learning a sub-optimal decision policy (i.e., power allocation)
by repeatedly interacting with the environment. First we
describe the two different paradigms in which the proposed
DPC-Q algorithm is applied: independent learning (IL) and
cooperative learning (CL). Then, the agents, states, actions
and reward functions used during the simulations are
introduced.

• Independent learning (IL): In this paradigm, each agent
learns independently from other agents (i.e., ignores other
agents' actions and considers other agents as part of the
environment). Although this may lead to oscillations and
convergence problems, the IL paradigm showed good
results in many applications [13]. The action selection
strategy for agent $i$ in the IL paradigm is the same as
the SAQL case: $a_i = \arg\max_{a \in A_i} Q_i(s_i, a)$, where $A_i$
is the set of actions available for agent $i$ (in our settings,
we assume that $A_i$ is the same for all agents $1 \le i \le N$,
where $N$ is the number of agents). The only difference
here compared to the SAQL case is that the reward
function is now dependent on the joint action of all agents
$\mathbf{a}$. Thus, the update rule can be rewritten as:

$$Q_i(s_i, a_i) := (1 - \alpha)\, Q_i(s_i, a_i) + \alpha \left( r_i(s_i, \mathbf{a}) + \gamma \max_{b \in A_i} Q_i(s_i', b) \right) \qquad (8)$$

However, in the multi-agent case, acting in an independent
way is not always the best approach because agents
now affect each other in terms of the reward function,
as shown in equation (8). So, agents will need to know
some information about each other (e.g., states, actions,
Q-tables, etc.). This information is shared during the
learning process in order to enhance the agents' performance.
Motivated by this, we propose a mechanism
where each agent shares a portion of its Q-table with all
other agents¹ (the Q-table is a table with $|S| \times |A|$ entries,
where $|S|$ and $|A|$ are the total number of possible states
and actions respectively).

¹ We assume that the shared portion of the Q-table is put in the control bits
of the packets transmitted between the femtocells. The details of the exact
protocol lie out of the scope of this paper.

• Cooperative learning (CL): CL is performed as follows:
agent $i$ shares the row of its Q-table that corresponds
to its current state with all other agents $j$. Then agent $i$
selects its action according to the following equation:

$$a_i = \arg\max_{a} \left( \sum_{1 \le j \le N} Q_j(s_j, a) \right) \qquad (9)$$



The main idea behind this strategy depends on two important
observations: 1) the meaning of the Q-value $Q(s, a)$,
which is an estimate of the value of future rewards if
the agent selects action $a$ in state $s$. For example, if the
reward of a femtocell is its capacity, then at a certain
instant, if the agent is in state $s$, has two actions a1,
a2 and $Q(s, a1) > Q(s, a2)$, then choosing a1 in state $s$
would achieve higher femtocell capacity than a2. 2) The
definition of the global Q-value $Q(\mathbf{s}, \mathbf{a})$, which represents
the Q-value of the whole system (i.e., if the multi-agent
scenario is transformed into a single agent one using a
centralized controller with global state $\mathbf{s}$ and global joint
action $\mathbf{a}$). This global Q-value can be decomposed into
a linear combination of local agent-dependent Q-values:
$Q(\mathbf{s}, \mathbf{a}) = \sum_{1 \le j \le N} Q_j(s_j, a_j)$ [14]. Thus, if each agent
$j$ maximizes its own Q-value, the global Q-value will be
maximized. Based on these two observations, choosing
the action based on equation (9) would maximize the
global Q-value. However, the solution is still not a global
optimum because, based on equation (9), all agents will
choose the same action. For example, if there are two
agents (femtocells) 1 and 2, each agent has one state $s$
and three actions a1, a2 and a3, the reward for each agent
is its capacity and the Q-values for both agents are as
follows: $Q_1(s, a1) = 1$, $Q_1(s, a2) = 2$, $Q_1(s, a3) = 3$,
$Q_2(s, a1) = 4$, $Q_2(s, a2) = 6$ and $Q_2(s, a3) = 4.5$,
then in the IL paradigm, agent 1 will choose action a3,
thus maximizing its capacity, while agent 2 will choose
action a2, thus maximizing its capacity. However, in the
CL paradigm, both agents will choose action a2 (the
maximum of the summation of the Q-values is $2 + 6 = 8$),
thus maximizing the aggregate capacity.
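The two-agent example above can be reproduced with a short Python sketch. It is a minimal illustration rather than the authors' implementation: the function names are hypothetical, Q-tables are plain nested lists, and the default `alpha`, `gamma` and `epsilon` values are the ones quoted in the simulation settings of Section V.

```python
import random


def greedy_il(own_row):
    """IL: index of the best action in the agent's own Q-row."""
    return max(range(len(own_row)), key=lambda a: own_row[a])


def greedy_cl(shared_rows):
    """CL, equation (9): index maximising the sum of all shared Q-rows."""
    n_actions = len(shared_rows[0])
    totals = [sum(row[a] for row in shared_rows) for a in range(n_actions)]
    return max(range(n_actions), key=lambda a: totals[a])


def epsilon_greedy(greedy_action, n_actions, epsilon=0.1, rng=random):
    """Explore with probability epsilon, otherwise keep the greedy action."""
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    return greedy_action


def q_update(q_table, s, a, reward, s_next, alpha=0.5, gamma=0.9):
    """Equation (8) for one agent; q_table is indexed [state][action]."""
    target = reward + gamma * max(q_table[s_next])
    q_table[s][a] = (1.0 - alpha) * q_table[s][a] + alpha * target
    return q_table[s][a]


# Q-rows of the two agents from the example in the text (actions a1, a2, a3).
q1 = [1.0, 2.0, 3.0]
q2 = [4.0, 6.0, 4.5]
print(greedy_il(q1), greedy_il(q2))  # 2 1  (a3 for agent 1, a2 for agent 2)
print(greedy_cl([q1, q2]))           # 1    (a2 for both agents)
```

Note that the update rule is identical in both paradigms; only the action selection changes, which is why cooperation needs nothing beyond the exchange of one Q-row per agent.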

In terms of overhead, according to our proposed cooperation
algorithm each femtocell should only share a row
of its Q-table with all its neighbors. This row has a size
of $1 \times |A|$. So if the number of femtocells is $N_{femto}$, then
the total overhead needed is $N_{femto} \cdot (N_{femto} - 1) \cdot |A|$
per unit time.
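As a quick check of how this overhead scales, the expression can be evaluated at the two ends of the network sizes used in the simulations (4 and 15 femtocells). The size of the action set $|A|$ is left symbolic here, since it depends on how the power levels are enumerated.

```latex
N_{femto} (N_{femto} - 1) |A| =
\begin{cases}
4 \cdot 3 \cdot |A| = 12\,|A|, & N_{femto} = 4 \\
15 \cdot 14 \cdot |A| = 210\,|A|, & N_{femto} = 15
\end{cases}
```

The number of exchanged Q-values therefore grows quadratically with the number of femtocells and only linearly with the number of actions.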

Finally, it should be noticed that we assume that the
information to be shared is put in the control bits of the
packets transmitted between the femtocells. The different
paradigms of the DPC-Q algorithm are summarized in
Algorithm 1.

Algorithm 1 The proposed DPC-Q algorithm

Let $t = 0$, $Q_i(s_i, a_i) = 0$ for all $s_i \in S_i$ and $a_i \in A_i$
Initialize the starting state $s_i^t$
loop
    send $Q_i(s_i^t, :)$ to all other agents $j$
    receive $Q_j(s_j^t, :)$ from all other agents $j$
    if rand < $\epsilon$ then
        select action randomly
    else
        if learning paradigm == IL then
            choose action: $a_i^t = \arg\max_a Q_i(s_i^t, a)$
        else
            choose action: $a_i^t = \arg\max_a \left( \sum_{1 \le j \le N} Q_j(s_j^t, a) \right)$
        end if
    end if
    receive reward $r_i^t$
    observe next state $s_i^{t+1}$
    update Q-table as in equation (8)
    $s_i^t = s_i^{t+1}$
end loop

The agents, states, actions and reward function are defined
as follows:

• Agent: $FBS_i$, $\forall\, 1 \le i \le N_{femto}$

• State: At time instant $t$ for femtocell $i$ in subcarrier $n$,
the state is defined as $s_t^{i,n} = \{I_t^n, P_t^i\}$, where $I_t^n \in \{0, 1\}$
indicates the level of interference measured at the macro-user
in subcarrier $n$ at time $t$:

$$I_t^n = \begin{cases} 1, & C_o^{(n)} < \Gamma_o \\ 0, & C_o^{(n)} \ge \Gamma_o \end{cases} \qquad (10)$$

where $\Gamma_o$ is the target capacity determining the QoS
performance of the macrocell. We assume that the macrocell
reports the value of $C_o^{(n)}$ to all FBSs through the backhaul
connection.

$P_t^i$ determines the total power FBS $i$ is transmitting with
at time $t$:



$$P_t^i = \begin{cases} 0, & \sum_{n=1}^{N_{sub}} p_t^{i,n} < (P_{max}^{f} - A_1) \\ 1, & (P_{max}^{f} - A_2) \le \sum_{n=1}^{N_{sub}} p_t^{i,n} \le P_{max}^{f} \\ 2, & \sum_{n=1}^{N_{sub}} p_t^{i,n} > P_{max}^{f} \end{cases} \qquad (11)$$

where $P_{max}^{f}$, $A_1$ and $A_2$ are set to 15, 5 and 5 dBm
respectively in the simulations and $p_t^{i,n}$ is the power
femtocell $i$ is transmitting with on subcarrier $n$ at time $t$. It
should be noticed that other values for $A_1$ and $A_2$ as well
as more power levels were tried through the simulations
and the performance gain was marginal.

• Action: The set of actions for each agent is the set of
possible powers that the FBS can use. In the simulations
a range from -20 to 15 dBm with a step of 2 dBm is used.

• Reward: Two different reward functions were considered
in the simulations:

1)

$$r_t^{i,n} = \begin{cases} \ldots, & \sum_{n=1}^{N_{sub}} p_t^{i,n} \le P_{max}^{f} \\ -1, & \sum_{n=1}^{N_{sub}} p_t^{i,n} > P_{max}^{f} \end{cases} \qquad (12)$$

The rationale behind this reward function is to maintain
the capacity of the macrocell at the target capacity $\Gamma_o$
while not exceeding the allowed $P_{max}^{f}$. The reason
for the small difference between the positive (when
$P_{max}^{f}$ is not exceeded) and negative (when $P_{max}^{f}$ is
exceeded) rewards is the way the states are
defined. Since the state $s_t^{i,n}$ is defined as $\{I_t^n, P_t^i\}$
and $P_t^i$ is defined for ranges of powers rather than
for discrete power levels, large negative
numbers cannot be assigned as a reward when $P_{max}^{f}$
is exceeded. For example, if $I_t^n = 1$ and $P_t^i = 0$
dBm, then FBS $i$ is in state $\{1, 0\}$ in subcarrier $n$.
If FBS $i$ took the action $a_t^{i,n} = 8$ dBm, then the
next state would be $\{1, 1\}$ and FBS $i$ is rewarded
positively according to equation (12). Now consider the
case when $I_t^n = 1$ and $P_t^i = 9$ dBm; FBS $i$ is
again in state $\{1, 0\}$ in subcarrier $n$. If FBS $i$ took
the same action $a_t^{i,n} = 8$ dBm, then the next state
would be $\{1, 2\}$ and FBS $i$ is rewarded -1. This
example shows that different rewards can
be assigned to the same state-action pair. Thus, the
difference between these rewards must not
be large. If the state was defined for discrete power
levels (e.g., $P_t^i = \sum_{n=1}^{N_{sub}} p_t^{i,n}$), then it would be possible
to assign rewards with large differences because the
case of having different rewards for the same
state-action pair would not occur. However, defining the states
in a discrete manner would dramatically increase the
number of possible states, which in turn makes it harder
to satisfy the condition of visiting all the state-action
pairs infinitely many times. Based on this observation,
in the next section we compare our reward function to
the reward function used in [10]:

$$r_t^{i,n} = \begin{cases} \ldots, & \sum_{n=1}^{N_{sub}} p_t^{i,n} \le P_{max}^{f} \\ 0, & \sum_{n=1}^{N_{sub}} p_t^{i,n} > P_{max}^{f} \end{cases} \qquad (13)$$

where $K$ is a constant value. We will show that our
reward function improves the convergence compared
to the reward function proposed in the literature. Note
that the authors in [10] defined the state for discrete
power levels, which supports our point.



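2)

$$r_t^{i,n} = \begin{cases} \ldots, & \sum_{n=1}^{N_{sub}} p_t^{i,n} \le P_{max}^{f} \\ \ldots, & \sum_{n=1}^{N_{sub}} p_t^{i,n} > P_{max}^{f} \end{cases} \qquad (14)$$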

The reward function defined by equation (12) does not
take into consideration the femtocell capacity. Thus, we
define the above reward function with the rationale of
maximizing the femtocell capacity while maintaining
the macrocell capacity at $\Gamma_o$.
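The state definition of equations (10) and (11) amounts to a small quantisation step, sketched below in Python. The names are hypothetical, and two points are assumptions rather than facts from the paper: the indicator is taken to be 1 when the macrocell capacity is below its target, and the total power is passed in as a single dBm figure, since the text does not say how the per-subcarrier powers are summed.

```python
# Simulation settings quoted in the text (all in dBm).
P_MAX_F = 15.0
A1 = 5.0
A2 = 5.0


def interference_indicator(macro_capacity, target_capacity):
    """Equation (10), assumed direction: 1 when the macro target is missed."""
    return 1 if macro_capacity < target_capacity else 0


def power_level(total_power_dbm):
    """Equation (11): map the total FBS power onto the levels 0, 1, 2."""
    if total_power_dbm > P_MAX_F:
        return 2
    if total_power_dbm >= P_MAX_F - A2:
        return 1
    # With A1 == A2 everything left is below P_MAX_F - A1.
    return 0


def femto_state(macro_capacity, target_capacity, total_power_dbm):
    """State {I, P} of one femtocell on one subcarrier."""
    return (interference_indicator(macro_capacity, target_capacity),
            power_level(total_power_dbm))


# Action set: -20 dBm upwards in 2 dBm steps. A 2 dBm step starting at -20
# does not land on 15, so this grid stops at 14 dBm; how the paper treats
# the upper end point is not stated.
ACTIONS_DBM = list(range(-20, 16, 2))

print(femto_state(5.0, 6.0, 9.0))   # (1, 0)
print(femto_state(5.0, 6.0, 17.0))  # (1, 2)
```

With two interference levels and three power levels each agent has only six states per subcarrier, which keeps the Q-table small enough for every state-action pair to be visited often.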



V. Performance Evaluation 

A. Simulation Scenario 

We consider a wireless network consisting of one macrocell
underlaid with $N_{femto}$ femtocells. In the simulations, $N_{femto}$
ranges from 4 to 15 femtocells. Each femtocell serves $U_f = 1$
femto-user, which is randomly located in the femtocell coverage
area. Both the macro and femto cells share the same
frequency band composed of $N_{sub} = 6$ subcarriers, where
orthogonal downlink transmission is assumed. The channel
gain between transmitter $i$ and receiver $j$ on subcarrier $n$ is
assumed to be path-loss dominated and is given by:



$$h_{ij}^{(n)} = d_{ij}^{(-k)} \qquad (15)$$

where $d_{ij}$ is the physical distance between transmitter $i$ and
receiver $j$, and $k$ is the path loss exponent. In the simulations
$k = 2$ is used. The distances are calculated according to the
following assumptions:

• The maximum distance between the MBS and its asso- 
ciated user is set to 1000 meters. 

• The maximum distance between the MBS and a femto- 
user is set to 800 meters. 

• The maximum distance between a FBS and its associated 
user is set to 80 meters. 

• The maximum distance between a FBS and another 
femtocell's user is set to 300 meters. 

• The maximum distance between a FBS and the macro- 
user is set to 800 meters. 
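To give a feel for the gains produced by equation (15), it can be evaluated at two of the distance limits listed above with $k = 2$. These are worst-case (maximum distance) values only, worked out here for illustration.

```latex
h = d^{-k}: \qquad
80^{-2} = \frac{1}{6400} \approx 1.56 \times 10^{-4}, \qquad
800^{-2} = \frac{1}{640000} \approx 1.56 \times 10^{-6}
```

A femto-user at the edge of its own cell thus sees a gain 100 times (20 dB) larger from its FBS than a macro-user at 800 meters sees from the same FBS.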

We used MatLab to simulate such scenario, where in the 
simulations we set the noise power a^ to 10"^, the maximum 
transmission power of the macrocell $P_{max}^{m}$ to 43 dBm, the
learning rate $\alpha$ to 0.5, the discount factor $\gamma$ to 0.9 and the
random number $\epsilon$ to 0.1 during the first 80% of the Q-iterations
ID, a and d. 

B. Numerical Results 

We will refer to the reward functions defined by equations
(12), (13) and (14) as RF1, RF2 and RF3 respectively in
all the simulations. Figure 1 shows the convergence of the
macrocell capacity on a certain subcarrier ($C_o^{(n)}$) using RF1
and RF2 with $K = 80$, $K = 1000$ and $K = 10000$. It
can be observed that RF1 shows better convergence behavior
than RF2 with all three values of $K$ (i.e., RF1 converges to the
target capacity ($\Gamma_o = 6$) more accurately). Moreover, the figure
shows that the value of $K$ affects the convergence, where
$K = 80$ is better than $K = 1000$ and $K = 1000$ is better than
$K = 10000$, which supports our point that as the difference between
the positive and negative rewards decreases, the convergence
is enhanced. Note that in the simulations, the number of
Q-iterations was 3000 while in the figure only 300 iterations are
shown (i.e., the figure is drawn with step = 10) in order to
achieve better resolution.

Fig. 1. Convergence of the macrocell capacity using different reward
functions with $N_{femto} = 4$ and target capacity = 6.

Fig. 2. Total femtocell capacity as a function of the number of femtocells
using RF1, RF2 with $K = 80$ and RF3 in the IL paradigm.

Figure 2 shows the total femtocell capacity using RF1,
RF2 with $K = 80$ and RF3 in the IL paradigm. It can be
observed that introducing $C_i^{(n)}$ in RF3 increases the aggregate
femtocell capacity compared to RF1. However, since the
IL paradigm is used here, the femtocells act in a selfish
way, which may reduce the fairness (in terms of capacity)
between the femtocells compared to RF1. This is shown in
Figure 3. Note that the fairness is evaluated using Jain's
fairness index [15]:
$f(x_1, x_2, \dots, x_n) = \frac{\left( \sum_{i=1}^{n} x_i \right)^2}{n \sum_{i=1}^{n} x_i^2}$, where
$0 \le f(x_1, x_2, \dots, x_n) \le 1$ and the equality to 1 occurs when
all the femtocells achieve the same capacity.
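Jain's index is simple enough to compute directly. The following Python sketch (the function name is hypothetical) evaluates it for a list of per-femtocell capacities and shows the two extreme cases.

```python
def jain_fairness(values):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2)."""
    n = len(values)
    sum_of_squares = sum(v * v for v in values)
    if n == 0 or sum_of_squares == 0:
        raise ValueError("need at least one non-zero value")
    total = sum(values)
    return (total * total) / (n * sum_of_squares)


print(jain_fairness([2.0, 2.0, 2.0, 2.0]))  # 1.0, all femtocells equal
print(jain_fairness([1.0, 0.0, 0.0, 0.0]))  # 0.25, one femtocell takes all
```

When a single femtocell out of $n$ obtains all the capacity the index drops to $1/n$, so values close to 1 in the fairness figures indicate an even split among the femtocells.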

As for the cooperation effect, Figure 4 shows the total femtocell
capacity using RF1 in the IL paradigm and RF3 in both the
IL and CL paradigms. From the figure, it can be noticed that
introducing cooperation increases the total femtocell capacity.
Actually, it can be observed that at N femto — H cooperation 
increased the capacity by around 2.6 bits/sec/Hz. Figure 5
shows that cooperation not only increases the capacity but
also enhances the fairness. Moreover, Figure 6 shows that
cooperation speeds up the convergence (in the CL paradigm,
convergence started after roughly 1800 iterations).

Fig. 3. Jain's fairness index (in terms of capacity) as a function of the number
of femtocells using RF1, RF2 with $K = 80$ and RF3 in the IL paradigm.

Fig. 4. Total femtocell capacity as a function of the number of femtocells
using RF1 in the IL paradigm and RF3 in both the IL and CL paradigms.

Fig. 5. Jain's fairness index (in terms of capacity) as a function of the
number of femtocells using RF1 in the IL paradigm and RF3 in both the
IL and CL paradigms.

Fig. 6. Convergence of the macrocell capacity using RF1 in the IL and
CL paradigms and RF3 in the IL paradigm with $N_{femto} = 4$ and target
capacity = 6.

VI. Conclusion and Future work

In this paper, a distributed Q-learning algorithm based on
multi-agent systems theory, called DPC-Q, is presented
to perform power allocation in cognitive femtocell networks.
The DPC-Q algorithm is applied in two different paradigms:
independent and cooperative. In the independent paradigm,
two scenarios were considered. The first scenario is to control
the interference generated by the femtocells on the macro-user,
where the results showed that the proposed algorithm
is capable of maintaining the capacity of the macro-user at
a certain threshold. The second scenario is to enhance the
aggregate capacity of the femtocells while maintaining the QoS
of the macro-user. Through simulations, we showed that the
independent learning paradigm can be used to increase the
aggregate femtocell capacity. However, due to the selfishness
of the femtocells, fairness is reduced compared to the first
scenario. Thus, we proposed a cooperative paradigm in which
femtocells share a portion of their Q-tables with each other.
Simulation results showed that cooperation is capable of
increasing the aggregate femtocell capacity and enhancing
the fairness compared to the independent paradigm, with a
relatively small overhead. Future work will focus on: 1)
devising a numerical framework to study the effect of changing
the Q-learning parameters (i.e., $\gamma$, $\epsilon$ and $\alpha$) on the performance
of the proposed algorithm, 2) designing a control protocol to
exchange the cooperation information amongst all femtocells,
3) investigating other techniques for cooperation, and 4) extending
cooperation to coordination, in which the femtocells try to coordinate their
actions with each other to achieve the optimum joint action.